Abstract

Few protein oligomers undergo a conformational transition to a state that impairs their function and leads to
disease. But when it happens, the consequences are serious, and the so-called conformational diseases pose major
public health problems. Notorious examples are Alzheimer's disease and some cancers, associated with a conformational
change of the amyloid precursor protein (APP) and of the p53 tumor suppressor, respectively. The transition is linked with
the propensity of β-strands to aggregate into amyloid fibers. Nevertheless, a huge number of protein oligomers associate
chains via β-strand interactions (intermolecular β-strand interfaces) without ever evolving into fibers. We analyzed the layout
of 1048 intermolecular β-strand interfaces, looking for features that could give the β-strands resistance to conformational
transitions. The interfaces were reconstructed as networks with the residues as the nodes and the interactions between
residues as the links. The networks followed an exponential decay degree distribution, implying an absence of hubs and a
predominance of nodes with few links. Such a layout provides robustness to changes. Few links per node do not restrict the
choice of amino acids capable of making an interface, and so maintain high sequence plasticity. Few links reduce the
"bonding" cost of making an interface. Finally, few links moderate the vulnerability to amino acid mutation because they
entail limited communication between the nodes. This confines the effects of a mutation to a few residues instead of
propagating them to many residues via hubs. We propose that intermolecular β-strand interfaces are organized in networks
that tolerate amino acid mutation to avoid chain dissociation, the first step towards fiber formation. This is tested by
looking at the intermolecular β-strand network of the p53 tetramer.
Introduction

Some proteins function as oligomers by associating several
copies of the same chain (homo-oligomers) or of different
chains (hetero-oligomers). Chain association takes place through
the formation of protein interfaces involving interactions between
atoms of the amino acids of adjacent chains. Such intermolecular
amino acid interactions are extensively studied by both
experimental and computational approaches [1-5]. Alanine scanning
mutagenesis has shown that only some of the amino acids of the
interface account for the binding free energy [6]. Thus, there exists
a subset of amino acids at interfaces, referred to as "hot spot"
amino acids, which are relevant for the chain association. This
discovery has led to ample computational tool development aimed
at identifying hot spots. The amino acids essential for interface
formation are now known colloquially as hot spots, without
necessarily implying validation by alanine scanning.

Among proteins, some have the fold plasticity to undergo a
transition from one oligomeric state to another. Of particular
interest are the cases where the new oligomeric state impairs the
protein function and leads to pathologies called protein misfolding
diseases or conformational diseases. This transition is responsible
for severe human diseases such as Alzheimer's disease (Aβ-amyloid),
Parkinson's disease (synuclein) and cerebral amyloid angiopathy
(cystatin C-amyloidosis). It is important to emphasize that the
phenomenon is not restricted to neurodegenerative diseases but
extends to cancer (p53), type II diabetes (IAPP, amylin),
cardiovascular diseases (transthyretin, serpin) and inflammatory
diseases (serpin) (reviewed in [7-11]). For each of these diseases,
the protein undergoing the transition is indicated in brackets.
A priori, these diseases are unrelated and the culprit proteins do
not share biological function or primary, secondary, tertiary or
quaternary structures (initial or final). So the occurrence of the
transition ought to be related to a local fold plasticity that
allows transitions between different oligomeric states. It could be
secondary structure plasticity, as observed for the DIII loop of
pore-forming toxins, which becomes a β-hairpin and promotes the
toxin's oligomerization, or tertiary structure plasticity like the
Figure 1. Illustration of the Gemini procedure on a trivial example. A. Interatomic distances between chain 1 and chain 2. On each chain,
atoms are indicated by small filled circles labeled with letters. For clarity, only a few of the interatomic distances are indicated by dotted lines. B.
Closest atoms. For every atom of S1, Gemini chooses the closest atoms on S2 (left picture) and for every atom of S2, Gemini chooses the closest atoms
on S1 (right picture). The closest atoms are encircled. C. Mutually closest atoms. Gemini selects the atoms that are mutually the closest. The amino acids to
which the mutually closest atoms belong are indicated by big filled circles. R stands for residue and the subscript is the position of the amino acid in
the sequence. D. Gemini graph of amino acids in interaction. The distances between amino acids in contact are now arbitrarily fixed to the same value
because the information on the "real" interatomic distances is lost. The pair of residues R99 and R25 is a single pair of amino acids (k = 1, that is,
one link connecting two residues). The residue R96 is a multiple contact amino acid because it is involved in two single pairs, one with R29 and the
other with R27.
doi:10.1371/journal.pone.0094745.g001



movement of the so-called "hinge loop" which leads to the 
formation of dimer or higher oligomeric states via a domain 
swapping mechanism [12-15]. 

The involvement of a local fold in the transition is in good
agreement with the presence of a common structural motif in the
pathological form of the culprit proteins. The pathological form,
whether a fiber or an oligomer, involves interactions between two
β-strands, each provided by a different chain (intermolecular
β-strands). These intermolecular β-strands share several structural
properties. They are recognized by the same antibody, A11 [16].
Their formation depends on interactions between atoms of the
backbone, a result which has led to the proposal that aggregation
is a generic property of the polypeptide chain [17,18]. They adopt a
cross-β structure which can be predicted from sequences by the
PIRA (Parallel 'In Register' Arrangement) model, a network made
of single pairs of residues [19-24]. Different predictors of the
aggregation-prone sequences involved in fiber formation are
now available [25-30].

Nevertheless, intermolecular β-strands are common in protein
oligomers that are not known to undergo a transition to
pathological assemblies. This suggests that there is a protection
mechanism that prevents some intermolecular β-strands from
undergoing the transition. We are interested in identifying the
features that determine the vulnerability of intermolecular
β-strands to a transition to pathological assemblies. The
intermolecular β-strand interactions that occur in conformational
diseases are often referred to as "aberrant" interactions because
they lead to a loss of protein function and finally to the disease,
while the intermolecular β-strand interactions that occur in
"healthy" protein oligomers are referred to as "functional"
interactions.

Previous studies, mainly in dimers, have shown that the
frequencies of individual amino acids in intermolecular β-strands
and in intramolecular β-strands are not different [31]. Yet we have
reported that intermolecular β-strands of oligomers of quaternary
structures above dimer have a scattered charge distribution, in
contrast to intramolecular β-strands and "aberrant" β-strands,
which have charges confined to their C- and N-terminal
extremities [26,32,33]. Edge β-strands have centrally located
charges which prevent their aggregation, an explanation that holds
for intermolecular β-strands as well [34]. In our study, the individual
hot spots did not have any features that could account for a
transition from "functional" to "aberrant" β-strand interactions.
Because of the small size of the dataset (40 intermolecular
β-strands), it was not possible to investigate the properties of the
hot spot pairs or of the layout of the interactions between hot spots.

We have now built a larger dataset of 1048 intermolecular
β-strands, enabling us to explore such properties. The results show
that the hot spots are not matched randomly but according to
chemical and geometrical properties of the side chains of the
amino acids. The role of the geometry is novel and might open
new avenues for understanding how intermolecular β-strands are
formed. The main result is that the interactions between hot spots
are organized to resist the effects of amino acid mutation,
possibly avoiding in this way chain dissociation upon mutation,
the first step to fiber formation.

Results 

The goal is to describe features of the hot spots involved in
intermolecular β-strands and to consider how they may participate
in a transition from "functional" to "aberrant" interactions. The
intermolecular β-strands are represented as networks of hot spots
in interaction, with hot spots as nodes and interactions as links.
Vocabulary related to graph and network theories is provided in
Methods.
Figure 2. General features of the dataset. A. Histogram of the
lengths (number of amino acids) of the whole chains (black bar) and of
the intermolecular β-strands (white bar). B. Histogram of the number of
hot spot pairs in the intermolecular β-strands (white bar) and in the
whole interface (black bar). The inset is a box plot of the number of amino
acid pairs in the intermolecular β-strands (quartile distribution). The
values within the box (interquartile) represent 75% of the dataset. The
points above the third quartile Q3 (outside of the box) are β-interfaces
whose number of amino acid pairs deviates significantly from the rest
of the dataset.

doi:10.1371/journal.pone.0094745.g002
The tool Gemini 

The nodes and the links of the networks are identified by our
tool Gemini. Gemini has been described previously, hence we only
briefly recall how the networks are built [35,36]. Each chain of a
protein oligomer is considered as a set of points in space whose
positions are the Cartesian coordinates (x, y, z) of the atoms of the
chain. The coordinates can be downloaded from the PDB. The
atoms of chain 1 constitute the set 1 (S1) and the atoms of
chain 2, the set 2 (S2). Gemini calculates distances between every
atom of S1 and every atom of S2 (interchain distances) but ignores
the distances between atoms of a single set (intrachain distances)
(Fig. 1A). Gemini chooses the closest atoms (Fig. 1B) and, among
them, retains only the pairs of mutually closest atoms (Fig. 1C). In
other words, Gemini starts from an atom X1 of S1 and walks to its
closest atom X2 on S2. It then checks that, when coming back to S1
by the shortest distance, it retraces its step to X1. If not, the pair of
atoms (X1, X2) is discarded, as for example for the pair (Ai, Bg) on
figure 1C. The pairs of atoms that are mutually closest are
considered to be interacting. At this stage the interchain
interactions are symmetrical and the interface is referred to as
around symmetrized [35]. In the last step, the pairs of atoms are
replaced by their respective amino acids and a coarse-grained
graph of amino acids in interaction is produced (Fig. 1D). Every
amino acid has k interactions or k links, where k equals the
number of atoms involved in a contact. There are single pairs of
amino acids (k = 1, that is, one link connecting two residues),
multiple pairs of amino acids (k links connecting two residues) and
multiple contact amino acids (an amino acid with k links to distinct
amino acids).
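
The mutually-closest selection described above can be illustrated with a minimal sketch. This is not the Gemini code itself: the function names, the toy coordinates and the residue labels below are assumptions made for illustration, and ties in distance are resolved arbitrarily by taking the first minimum.

```python
from math import dist

def mutually_closest_pairs(set1, set2):
    """Return index pairs (i, j) such that set2[j] is the closest point to
    set1[i] AND set1[i] is the closest point to set2[j]."""
    # Closest atom of S2 for every atom of S1 (first minimum wins on ties).
    nearest_in_2 = [min(range(len(set2)), key=lambda j: dist(p, set2[j]))
                    for p in set1]
    # Closest atom of S1 for every atom of S2.
    nearest_in_1 = [min(range(len(set1)), key=lambda i: dist(q, set1[i]))
                    for q in set2]
    # Keep a pair only if the walk back from S2 retraces its step to S1.
    return [(i, j) for i, j in enumerate(nearest_in_2) if nearest_in_1[j] == i]

def residue_links(pairs, residue_of_1, residue_of_2):
    """Coarse-grain atom pairs into residue pairs; the value is the number
    of atom pairs (links) joining the two residues."""
    links = {}
    for i, j in pairs:
        key = (residue_of_1[i], residue_of_2[j])
        links[key] = links.get(key, 0) + 1
    return links

# Hypothetical toy data: three atoms per chain, with made-up residue labels.
chain1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
chain2 = [(0.0, 1.0, 0.0), (5.0, 1.0, 0.0), (9.0, 1.0, 0.0)]
res1 = ["R25", "R25", "R27"]
res2 = ["R99", "R96", "R96"]

pairs = mutually_closest_pairs(chain1, chain2)
links = residue_links(pairs, res1, res2)
```

In this toy case the second atom of chain 1 points to the first atom of chain 2, but that atom points back to the first atom of chain 1, so the pair is discarded; no cut-off distance is involved at any step.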

Due to the choice of only mutually closest atoms, Gemini
produces a graph of amino acids in interaction which is essentially
a framework of interactions but not the set of all possible
interactions. The amino acids selected by Gemini are detected as
hot spots by available programs, showing the robustness of defining
an interface based only on geometry and its accuracy in picking up
relevant amino acids [35]. Importantly, Gemini does not
need a cut-off distance to select atoms of the interface as is
classically done, for example to select preferentially backbone or
side chain atoms. In this way Gemini avoids the variability of the
selection inherent to the choice of a cut-off [37]. Gemini naturally
selects backbone and/or side chain atoms as part of the interface
according to the geometry of the interface. Note that Gemini is
applicable to any set of points in any metric space and can be used
beyond the problem addressed in this paper.

The dataset 

The PDB files of 755 protein oligomers containing at least one
intermolecular β-strand interface are extracted from the RCSB
(Biological assembly) and in total 1048 intermolecular β-strand
networks are constructed with Gemini. It is a non-redundant
dataset of oligomers assembling three (trimers) to twelve subunits
(dodecamers). The oligomers are selected only on the presence of
intermolecular β-strands since we are looking for elements relevant
to the formation of the interface but not to the formation of the
whole chain. To fit that condition and alleviate the pressure of
evolution due to fold or function similarities, we need a dataset
with high diversity in terms of the features of the whole chains.
The 755 protein oligomers classify into 234 SCOP families and 30
distinct functions, are produced by organisms from the three
domains of life and have on average a full chain length of
206±140 amino acids (average ± standard deviation) [38-40].

In contrast, we need a narrow diversity in terms of
the features of the intermolecular β-strands to provide evidence of
a common construction mechanism. The average length of the
intermolecular β-strands is 18±13 amino acids, the length being
calculated as the sum of the amino acids of the two β-strands. The
distribution of the whole chain lengths is broader than that of the
β-interface lengths (Fig. 2A). The intermolecular β-strands have on
average 13±8 hot spots, 75% have less than 16 hot spots and 25%
have between 30 and 77 hot spots. Likewise, there are on average
12±8 hot spot pairs per interface, 75% of the interfaces have less
than 15 hot spot pairs while 25% have between 25 and 50 (Fig. 2B,
inset). The number of hot spot pairs in the intermolecular
β-strands is compared to the total number of hot spot pairs over the
whole interfaces to assess the diversity of intermolecular β-strands
in terms of the number of interactions necessary to build them.
The distribution of the number of pairs in intermolecular
β-strands is narrower than in the whole interface (Fig. 2B). Globally,
75% of the dataset have intermolecular β-strands sharing features.
Moreover, there is no correlation between the length of the whole 
Table 1. Whole chain amino acid and individual hot spot
frequencies.

Amino acid   Whole chain   SC      BB
A            0.086         0.037   0.057
C            0.012         0.009   0.016
D            0.057         0.048   0.034
E            0.071         0.066   0.052
F            0.039         0.057   0.053
G            0.078         0.032   0.081
H            0.023         0.033   0.021
I            0.064         0.074   0.096
K            0.058         0.053   0.050
L            0.088         0.080   0.081
M            0.018         0.024   0.024
N            0.038         0.043   0.032
P            0.045         0.040   0.012
Q            0.033         0.040   0.037
R            0.051         0.062   0.050
S            0.056         0.062   0.061
T            0.056         0.075   0.070
V            0.083         0.090   0.109
W            0.011         0.019   0.013
Y            0.032         0.056   0.050

chain and the length of the intermolecular β-strands (not shown,
R = 0.03), supporting the idea that the two objects have
independent features.

The dataset contains 568 anti-parallel β-sheets, 132 parallel
β-sheets and 348 other β-strand arrangements (close packed
β-strands), and 60% of the cases have β-strands with distinct
sequences. One can already anticipate that the intermolecular
β-strands of the dataset cannot be predicted based on a network of
pairs of residues following a Parallel In Register Arrangement
(PIRA) because only 12% are parallel β-sheets and most β-strands
have non-identical sequences. The global features already
highlight a network arrangement different from that of
aggregation-prone sequences [25].



Analysis of the properties of the residues in interaction in
intermolecular β-strands

Gemini labels backbone and side chain atoms of the amino
acids such that it produces two sub-graphs: one involving pure
backbone interatomic interactions (BB networks) and the other
involving interactions with at least one atom of the side chain (SC
networks). We have shown that this distinction is necessary to
exhibit features of intermolecular β-strands [32]. This is certainly
related to the involvement of the backbone interactions in the
hydrogen bond network of the β-sheets, while in α-helices such
backbone interactions are involved intramolecularly and do not
interfere with intermolecular interactions. This is in good
agreement with previous reports that side chain and backbone
interactions are involved in hydrophobic and hydrogen bonding,
respectively [1,41,42].

First, the properties of the individual hot spots are analyzed.
Totals of 704623, 10692 and 5950 amino acids are observed in the
whole chains, the SC and the BB hot spots, respectively. These
figures support the reliability of the statistics, which
improves with the size of the sample. The amino acid frequencies
are indicated in Table 1 and used to measure the average chemical
property, the global (GP) and local propensity (LP) of the amino
acids (Tables 2 and 3, respectively). As observed previously, the SC
and BB hot spots have average chemical properties similar to the
amino acids of the whole chains, global propensity and local
amino acid distribution coherent with sequences made of β-strands,
as well as a scattered charge distribution [32]. Namely, high β-sheet
propensity residues (F, W, Y) are significantly more frequent while
low β-sheet propensity residues (G and A) are significantly less
frequent [43-45]. The β-strand extremities are enriched in β-breaker
amino acids (P and G) while high β-sheet propensity residues are
enriched centrally (V, L) [46,47]. The charged residues R, K and E are
enriched at the β-strand extremities whereas H and D residues are
more frequent centrally when the local preferences of the SC
charged residues are considered (Table 4).

Second, the properties of the pairs of hot spots in interaction are
analyzed. Because most of the intermolecular β-strands are not
made of β-strands with an identical sequence, the occurrences
$n_{ab}$ and $n_{ba}$ are initially counted separately, but a test
calculated over the occurrences $n_{ab}$ and $n_{ba}$ shows that the
differences are insignificant, and so the $n_{ab}$ and $n_{ba}$
occurrences are summed (Tables 5 and 6). The test ignored the values
for the pairs of identical residues, for which a equals b. There are
10551 SC pairs and 5894 BB pairs, again highlighting the reliability
of the statistics. The frequencies of the hot spot pairs $f_{ab}$ are
calculated with equation (1) and shown in the tables constituting
figure 3.



Table 2. Chemical properties of the intermolecular β-strands and of the whole chains (%).

Cases                          Hydrophobic   Charged   Polar
Whole chain residues           49±5          26±5      25±6
BB hot spots (all)             58±27         19±20     23±23
BB hot spots (anti-parallel)   59±26         19±19     21±21
BB hot spots (parallel)        58±17         17±15     25±18
SC hot spots (all)             47±20         26±19     26±17
SC hot spots (anti-parallel)   47±21         26±19     26±18
SC hot spots (parallel)        46±17         25±16     29±14
Table 4. Local preferences of the charged amino acids in the SC hot spots.

Charged   Outer frequency (fo)   Central frequency (fc)   fo - fc
D         0.16                   0.20                     -0.042
E         0.25                   0.25                     0.004
H         0.11                   0.14                     -0.033
K         0.21                   0.20                     0.018
R         0.27                   0.21                     0.053
Average                                                   0.000
S.D.                                                      0.039
$$f_{ab} = \frac{n_{ab} + n_{ba}}{\sum_{a=A,\,b=A}^{a=Y,\,b=Y} \left(n_{ab} + n_{ba}\right)} \qquad (1)$$

The ratio $f_{ab}/(f_a \cdot f_b)$ is measured to compare observed
values $f_{ab}$ with expected values $f_a \cdot f_b$ (Tables 7 and 8).
If the frequency $f_a$ is independent of the frequency $f_b$, the
ratio is equal to one. Overall
the hot spots are not matched randomly since 70% and 66% of the
BB and SC pairs, respectively, have a ratio that deviates from one.
It is therefore necessary to measure the pair frequencies because
they cannot be simply derived from the frequencies of individual
hot spots.

To evaluate whether the distinction between SC and BB hot spots is
also relevant at the level of the pairs, the frequencies of the SC
pairs are plotted against the frequencies of the BB pairs (Fig. 4).
On the diagonal, there are 50 pairs out of a total of 210, thus
indicating that 76% of the BB and SC pairs have different
frequencies. It is therefore important to investigate them
separately. Subsequent analyses are performed using quartiles to
take into account the observation that 75% of the intermolecular
β-strands share similar global interface features while 25% are
more heterogeneous. The amino acids with the highest 25% pair
frequencies (> quartile Q3) are considered as preferred contacts
(Fig. 3, red) whereas those with the lowest 25% pair frequencies (<
quartile Q1) are considered as avoided contacts (Fig. 3, green). The
neutral contacts have frequencies between Q1 and Q3 (Fig. 3,
white). The Q;^ and Qj of the SC hot spot pairs are 6.0 x 10~^ and 
2.2 X 10 ' , respectively. The and Qj of the BB hot spot pairs 
are 6.7xl0~^ and 1.7x10"^, respectively. In both networks, on 
average, every amino acid pairs with 5 other types of amino acids
out of its twenty pairing possibilities. The most preferred contacts
are defined as amino acids which pair with a frequency above
Q3 with more than five other types of amino acids. For both
networks, the most preferred contacts are I, L, V, S and T,
similarly to what was found for intermolecular β-strands in dimers
[31]. On the other hand, compared to the dimers, F and Y residues
are preferred in the SC networks while A and G are preferred in
the BB networks; the residue E is preferred in both. Likewise, the
most avoided contacts are defined as amino acids which pair
with a frequency below Q1 with more than five other types of
amino acids. For both networks, the most avoided contacts are
with C, M, W and H residues, similarly to intermolecular
β-strands in dimers. In addition contacts with A, G and are
avoided in the SC networks while contacts with N and residues
are avoided in the BB networks.
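
The quartile-based labeling of contacts can be sketched as follows. The pair frequencies are invented for illustration and are not values from the dataset; the quartile convention (the default "exclusive" method of Python's `statistics.quantiles`) is also an assumption, since the text does not state which convention was used.

```python
from statistics import quantiles

def classify_pairs(pair_freq):
    """Label each pair 'preferred' (> Q3), 'avoided' (< Q1) or 'neutral'."""
    q1, _median, q3 = quantiles(list(pair_freq.values()), n=4)
    labels = {}
    for pair, f in pair_freq.items():
        if f > q3:
            labels[pair] = "preferred"
        elif f < q1:
            labels[pair] = "avoided"
        else:
            labels[pair] = "neutral"
    return labels

# Hypothetical frequencies for eight residue pairs (not from the paper).
toy_freq = {
    ("V", "V"): 0.08, ("I", "L"): 0.07, ("L", "V"): 0.06, ("S", "T"): 0.05,
    ("E", "K"): 0.04, ("A", "G"): 0.03, ("C", "M"): 0.02, ("H", "W"): 0.01,
}
labels = classify_pairs(toy_freq)
```

With these toy values the two most frequent pairs fall above Q3 and the two least frequent fall below Q1; counting, for each amino acid, how many partner types carry a given label would then give the "most preferred" and "most avoided" contacts in the sense used above.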



The features of the hot spot pairs are then analyzed considering
the chemistry and the geometry of the amino acids (Tables 9 and 10,
respectively). Both SC and BB hot spot pairs have similar
tendencies for contacts with hydrophobic residues, but the contacts
with polar and charged residues are twice as frequent in the SC
pairs. Even more striking differences are the contacts between two
charged residues, between two polar residues, or between
one charged and one polar residue, which are at least ten times more
frequent in the SC networks. Considering geometrical properties
(length of the side chains), the contacts with long and medium
residues are significantly more frequent in the SC pairs than in the
BB ones, which on the contrary favor contacts between short side
chain residues.

Third, the number of contacts of the hot spots is counted to
determine whether the hot spots have multiple contacts. The BB
networks have as many single contact hot spots (2941) as two
contact hot spots (2993) but very few three contact hot spots (12).
The degree distribution P(k) is equal to the ratio of the number of
hot spots with k contacts to the total number of hot spots. For the
BB networks, P(k) has a bell-like shape with an average number of
contacts <k> equal to 1.5 (Fig. 5A). On the other hand, P(k) for the SC
networks falls on a straight line when plotted on a linear-log scale,
indicating an exponential decay, a variation from the power law
distribution observed for real networks [48] (Fig. 5A, = 0.99).
The average number of contacts <k> of the SC hot spots is 1.4.
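
As a quick check of the definitions above, the following sketch computes P(k) and <k> from the BB counts quoted in the text, and shows how an exponential decay appears as a straight line of ln P(k) against k. The helper names are ours, and the exponential data used for the slope are synthetic (the SC counts per k are not given in the text).

```python
import math

def degree_distribution(counts):
    """P(k): number of hot spots with k contacts over the total number."""
    total = sum(counts.values())
    return {k: n / total for k, n in counts.items()}

def mean_degree(counts):
    """Average number of contacts <k>."""
    total = sum(counts.values())
    return sum(k * n for k, n in counts.items()) / total

def log_linear_slope(pk):
    """Least-squares slope of ln P(k) versus k; a good linear fit on this
    scale indicates an exponential decay."""
    ks = list(pk)
    ys = [math.log(pk[k]) for k in ks]
    mk, my = sum(ks) / len(ks), sum(ys) / len(ys)
    num = sum((k - mk) * (y - my) for k, y in zip(ks, ys))
    den = sum((k - mk) ** 2 for k in ks)
    return num / den

# BB counts quoted in the text: 2941 single, 2993 double, 12 triple contacts.
bb_counts = {1: 2941, 2: 2993, 3: 12}
bb_pk = degree_distribution(bb_counts)
bb_mean = mean_degree(bb_counts)

# Synthetic exponential decay (assumed rate 0.8), for illustration only.
synthetic_pk = {k: math.exp(-0.8 * k) for k in range(1, 6)}
slope = log_linear_slope(synthetic_pk)
```

The BB counts give <k> close to 1.5, as reported, while the synthetic series returns a slope equal to minus its decay rate.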

To determine a prototype intermolecular β-strand network, we
use a binomial model with 9 amino acids per strand, 6 hot spots
and the probability p = 0.16 of having a contact (see Methods for
the definition of a binomial law). These values are based on the
averages of 18 amino acids, 12 hot spots and 10 links per interface
measured over the dataset. A fully connected graph of 9 amino
acids per strand (every amino acid of one strand linked to every
amino acid of the other) would have 81 links (9 by 9) and so, in
total over the dataset, 84888 links. Only 13628 links are measured,
thus the probability p of making a contact (having a link) is equal
to 0.16 (13628/84888). Assuming that the amino acids have a uniform
distribution of links (i.e. all amino acids have the same
probability of making a link), the binomial model calculates a
prototype network with 21% of non-connected amino acids (not hot
spots), 36% of amino acids with one contact and 43% of amino acids
with more than one contact; 27% of amino acids would have two
contacts and 12% would have three. The observed data indicate 49%
of amino acids with one contact and 51% of amino acids with more
than one contact: 33% with two, 14% with three and 4% with more
than three contacts. The observed data are measured on hot spots
only and so do not take into account the non-connected amino acids.
In the binomial model, the "hot spots", namely the amino acids with
a link, are 79% (36% with one contact and 43% with more than
Figure 3. Tables of the $f_{ab}$ pair frequencies. A. Observed BB pair frequencies. B. Observed SC pair frequencies. The frequency $f_{ab}$ is for pairs of
hot spots ab read on the lines a and the columns b. The preferred (>Q3) and avoided (<Q1) pairs are indicated by red and green color, respectively.
The pairs with a frequency between Q1 and Q3 are not colored. The residues are ordered alphabetically within hydrophobic, charged and polar
groups. C. SC and BB pair distinction. The ratios of the frequency of a pair ab in the SC sub-networks to its frequency in the BB sub-networks are
indicated. The pairs more frequent in the SC sub-networks are indicated in red (ratio >1.2) and the pairs more frequent in the BB sub-networks are
indicated in green (ratio <0.8). For ratios ranging from 0.8 to 1.2, the pairs are not colored. The abbreviation n.a. stands for "not applicable", which
corresponds to a division by zero; those pairs are more represented in the SC sub-networks.
doi:10.1371/journal.pone.0094745.g003



one). The percentage of amino acids with k contacts over a
network made only of hot spots can be estimated for the binomial
model by multiplying the calculated values by a factor 100/79.
That produces 46% of hot spots with one contact (36 × 100/79),
54% (43 × 100/79) of hot spots with more than one contact, 34%
(27 × 100/79) with two, 15% (12 × 100/79) with three and 5% of
hot spots with more than three contacts, in good agreement with
the observed values.
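
The binomial figures quoted above can be reproduced with a short calculation, sketched here under the stated parameters (n = 9 amino acids per strand, p = 0.16). The variable names are ours; the rescaling to hot spots divides by the probability of having at least one contact instead of using the rounded factor 100/79.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k contacts among n possible partners."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 9, 0.16  # values used in the text (p = 13628/84888, rounded)
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Fraction of amino acids with at least one contact (the model "hot spots").
connected = 1.0 - pmf[0]

# Distribution restricted to hot spots only (k >= 1).
hot_spot_pmf = {k: pmf[k] / connected for k in range(1, n + 1)}
more_than_three = sum(hot_spot_pmf[k] for k in range(4, n + 1))

for k in range(4):
    print(f"k={k}: {100 * pmf[k]:.1f}% of all amino acids")
print(f"hot spots with more than three contacts: {100 * more_than_three:.1f}%")
```

Because the text rescales already rounded percentages, the unrounded values from this sketch can differ from the quoted ones by about one percentage point (for example, about 45% rather than 46% of hot spots with a single contact).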

We then examined whether the hot spots had unusual amino acid
features according to their number of contacts. The frequency of a
hot spot in multiple contacts is divided by its frequency in single
contact to measure the amino acid propensity to have multiple
contacts. This propensity is plotted against the respective number
of atoms of the side chain. No correlation is found for the BB hot
spots (not shown, R = 0.41) and only the branched residues V, I and L
have a higher tendency to make two interactions, suggesting that
they are enriched in intermolecular β-strands involving parallel
β-strands. On the other hand, there is a good linear correlation for
the SC hot spots (Fig. 5B, R = 0.8). Thus, the propensity of an SC
hot spot to make contacts is proportional to the number of its side
chain atoms. Lastly, the probability of having hot spots with more
than three contacts (k>3) is plotted against the number of atoms
and compared to the probability of having a hot spot with one
contact only (Fig. 5C). The probability of having hot spots with
more than three contacts increases with the number of atoms,
whereas the probability of single contact hot spots is distributed
around 0.05. This probability (1/20) implies an identical
chance for all amino acids to have a single contact, indicating no
amino acid specificity for such a contact number. On the other
hand, only residues with more than 14 atoms (F, Y, R and W) have
a probability above 0.05 of making more than three contacts, with
the exception of the residue K.

Discussion 

The analysis of the individual hot spot properties confirms a
scattered charge distribution on the β-strands and high β-sheet
propensity residues enriched centrally, more particularly
branched side chain residues (V and L). This indicates that linear
information, namely the information read on the sequence of the
β-strands, codes essentially for solubility and regulation of the
secondary structures.

Discriminating SC and BB interactions is again relevant at the
level of the pairs, as the SC and BB pair preferences diverge
significantly. The ratio of SC to BB hot spots and the ratio of SC
to BB pairs are on average around 2, indicating that the SC
preferences are likely to have more influence over the
intermolecular β-strands. One novel observation is that the pair
matching is not only based on the chemistry of the amino acids but
also on their geometry, as seen in the preferences for long or
charged residues in the SC pairs and for small or hydrophobic
residues in the BB pairs. There is even enrichment in pairs
combining amino acid properties, such as pairs between long and
charged residues or pairs between long and polar residues. In both
SC and BB pairs, the branched residues V, I and L are preponderant
contacts. A chemical-centric view of the pair matching is obviously
inappropriate and in fact the pairing calls upon the versatile
properties of amino acids. It might be interesting to explore the
role of geometrical parameters in the formation of intermolecular
β-strands, experimentally and theoretically. For instance, one
theoretical approach would be to use minimum Steiner trees,
which offer a purely geometrical description of the amino acids, to
determine whether the pair matching yields a minimum energy
conformation of the interface [49]. Contacts between identical
residues represent only around 10% of the total preferred contacts,
indicating a minor role in the matching process. This differs from
a previous report on dimeric intermolecular β-strands and from the
prediction by a PIRA model [25,31]. The data show that the 2D
information, namely the amino acid pairing, is not random and is
important for the intermolecular β-strands, which is not surprising
since β-strands are not viable without making interactions with
another structural element.

Now the SC and BB networks differ not only by their amino acid 
pairing but also by distinct network features. The BB networks have 
nodes with one or two contacts, probably reflecting the hydrogen 
bond networks of anti-parallel (single contact) and parallel 
β-sheets (two contacts), respectively. The BB networks would 
essentially code for secondary structure interactions. The SC 
networks follow an exponential decay degree distribution and have 
nodes with one, two or three contacts but rarely with more than 
three. Thus the intermolecular β-strands result from the 
juxtaposition of two networks and the information for making the 
interface is encoded via a double layer of interactions. One layer 
is composed of the BB atoms and provides promiscuous interactions, 
namely low specificity in terms of amino acid composition and 
interaction motifs. The second layer is composed of the SC atoms, 
which on the contrary provide selective interactions, namely high 
specificity in terms of amino acid composition and interaction 
motifs. Such a double layer of interactions has been described for 
the interfaces between colicins and their immunity binding proteins 
as a way to evolve binding affinity [50]. There is also a precedent 
describing monomeric proteins and intramolecular amino acid 
interaction networks [51]. One network, based on short range 
interactions between Cα atoms, had a bell curve degree distribution 
(random network feature) whilst the other, based on long range 
interactions (side chain atoms), had an exponential decay degree 
distribution (single-scale network feature). 
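
To make the notion of an exponential decay degree distribution concrete, the following minimal sketch tallies P(k) from a list of links and fits ln P(k) against k by ordinary least squares. The function names, the toy distribution and the choice of a least-squares fit are assumptions made for illustration; they are not the procedure used to produce Figure 5.

```python
import math
from collections import Counter

def degree_distribution(links):
    """Return {k: P(k)} for an undirected network given as (node, node) links."""
    degree = Counter()
    for a, b in links:
        degree[a] += 1
        degree[b] += 1
    counts = Counter(degree.values())
    n_nodes = len(degree)
    return {k: c / n_nodes for k, c in sorted(counts.items())}

def fit_exponential(pk):
    """Least-squares fit of ln P(k) = intercept + slope * k.

    Returns (slope, intercept, r2). A straight line on a semi-log plot
    (r2 close to 1, negative slope) indicates an exponential decay.
    """
    ks = list(pk)
    ys = [math.log(pk[k]) for k in ks]
    n = len(ks)
    mean_k = sum(ks) / n
    mean_y = sum(ys) / n
    sxx = sum((k - mean_k) ** 2 for k in ks)
    sxy = sum((k - mean_k) * (y - mean_y) for k, y in zip(ks, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_k
    ss_res = sum((y - (intercept + slope * k)) ** 2 for k, y in zip(ks, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Hypothetical distribution in which P(k) halves with each extra contact.
toy_pk = {1: 0.5, 2: 0.25, 3: 0.125, 4: 0.0625}
slope, intercept, r2 = fit_exponential(toy_pk)
```

In this toy case the fitted slope equals minus ln 2, the signature of a distribution that is divided by two for every additional contact.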

The exponential decay degree distribution likely fits a network 
optimized to reduce the number of links, which is relevant because 
making a chemical link has a cost. Moreover, the data show that 
above three contacts there is a strong stringency on the choice of 
the amino acids, suggesting that a node with too many links, a hub, 
would seriously decrease the sequence plasticity needed to 
successfully realize an interface. Intermolecular β-strands are 
very plastic in terms of sequence requirement and therefore seem 
built to avoid hubs. Hubs are communication devices but also the 
Achilles' heel of the network: a modification of a hub spreads 
changes within the whole network because the hubs are connected to 
many nodes [52]. The propagation of changes upon node modification 
is called network rewiring [53]. The intermolecular β-strand 
networks, which lack hubs, are likely little inclined to rewiring 
because of 


PLOS ONE I www.plosone.org 



8 



April 2014 | Volume 9 | Issue 4 | e94745 



Characteristics of Intermolecular β-Strand Networks 








Table 7. Ratio of f_ab/(f_a × f_b) for the BB hot spot pairs. 

Table 8. Ratio of f_ab/(f_a × f_b) for the SC hot spot pairs. 

Figure 4. Comparison of the 210 frequencies of the BB and SC 
hot spot pairs. The frequencies of the SC hot spot pairs are plotted 
against those of the BB hot spot pairs, both in log scale. Pairs with 
identical BB and SC frequencies are on the diagonal. Pairs more 
frequent in SC are found above the diagonal whereas pairs more 
frequent in BB are found below the diagonal. 
doi:10.1371/journal.pone.0094745.g004 

their low interconnectedness. Counterintuitively, the robustness of 
intermolecular β-strands would appear to rest on a low occurrence 
of links, which maintains high sequence plasticity, cuts costs in 
terms of links and reduces their vulnerability to changes 
(mutation). 

It is tempting to speculate that a higher number of links is one of 
the necessary conditions for a transition from "functional" to 
"aberrant" intermolecular β-strands. It is possible that "healthy" 
protein oligomers which become pathological fibers have interfaces 
with more links per node and networks more sensitive to rewiring 
than those which do not form fibers. To examine this possibility, 
the tumor suppressor p53 tetramer (PDB 1SAK, Fig. 6A), a known case 
of a healthy oligomer undergoing a transition to a fiber, is 
considered. First, the Gemini graph of the WT p53 is generated 
(Fig. 6B). The greater occurrence of multiple contact residues is 
striking in the WT p53 network, supporting the hypothesis. The p53 
hot spots have on average <k> = 3 contacts, twice the <k> value of 
the intermolecular β-strand networks. The p53 network has 33% of 
hot spots with more than three contacts, which is 6 times more than 
the prototype network. On the other hand, it has 25% of single 
contact hot spots, half as many as the prototype network. 
Consequently the interconnectedness is larger in the p53 network 
than in the prototype network. 
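
The three quantities compared above (the average number of contacts <k>, the share of hot spots with more than three contacts and the share of single contact hot spots) can be read off any list of links. The sketch below shows one way to do so; the function name and the toy network are hypothetical and do not reproduce the p53 or prototype data.

```python
from collections import Counter

def contact_summary(links):
    """Summarize node degrees of an undirected network of (node, node) links."""
    degree = Counter()
    for a, b in links:
        degree[a] += 1
        degree[b] += 1
    ks = list(degree.values())
    n = len(ks)
    return {
        "mean_k": sum(ks) / n,                                  # <k>
        "frac_single": sum(1 for k in ks if k == 1) / n,        # k = 1
        "frac_above_three": sum(1 for k in ks if k > 3) / n,    # k > 3
    }

# Hypothetical toy network: node "h" acts as a small hub with four links.
toy_links = [("h", "a"), ("h", "b"), ("h", "c"), ("h", "d"), ("a", "b")]
summary = contact_summary(toy_links)
```

Running the same summary on two networks gives a like-for-like comparison of their interconnectedness.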

To look at the sensitivity of the p53 network to single point 
mutation, the G334V mutant, a familial mutation that leads to the 
dissociation of the p53 tetramer, malfunction of the protein and 
cancer development, is considered [54]. The Gemini graph of 
G334V is generated and network rewiring is investigated (Fig. 6C). 
The mutation has a strong global effect on the network, as all the 
residues of the p53 intermolecular β-strands from 324 to 334 have 
their links modified by the mutation even when they are not 
directly linked to residue 334. The modifications are either: (i) 
vanishing of the links (e.g. D324, G325), (ii) changes of the type 
of links, such as side chain to backbone (e.g. I332, L330), (iii) 
decrease of the number of contacts (e.g. Q331, T329) or else (iv) 
changes of contacts (R333). The changes in the network are not 
limited to residues of the intermolecular β-strands but extend to 
interactions between residues that belong to α-helices. This shows 
that there is significant network rewiring in p53 due to a single 
node modification, the mutation of residue G334, again supporting 
the hypothesis. Mutation of other p53 residues, such as T329A or 
Q331A, also leads to similar network rewiring (not shown). Rewiring 
alone therefore cannot explain the capacity of the G334V mutant to 
form a fiber, because the T329A and Q331A mutants do not form 
fibers [54]. The extent of the changes in the network might be such 
that the intermolecular β-strand interactions are destabilized, 
promoting chain dissociation, the first step to fiber formation. 
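
Network rewiring, as used here, amounts to comparing the links of each residue before and after a node modification. The sketch below lists the residues whose links differ between two networks, whether a link vanished, appeared or changed type. The link representation (a pair of node labels plus a link type such as "SC" or "BB") and the toy data are assumptions for illustration only; they are not the Gemini output nor the actual p53 contacts.

```python
def rewired_residues(wt_links, mutant_links):
    """Return the sorted residues whose links differ between two networks.

    Each link is a (residue_a, residue_b, kind) tuple, where kind is a
    label for the type of interaction, e.g. "SC" or "BB".
    """
    def links_by_residue(links):
        table = {}
        for a, b, kind in links:
            table.setdefault(a, set()).add((b, kind))
            table.setdefault(b, set()).add((a, kind))
        return table

    wt = links_by_residue(wt_links)
    mut = links_by_residue(mutant_links)
    return sorted(
        r for r in wt.keys() | mut.keys()
        if wt.get(r, set()) != mut.get(r, set())
    )

# Hypothetical networks: one link changes type, one link vanishes.
wt = [("n1", "n2", "BB"), ("n3", "n4", "SC"), ("n5", "n6", "SC")]
mutant = [("n1", "n2", "BB"), ("n3", "n4", "BB")]
changed = rewired_residues(wt, mutant)
```

A network with few links per node keeps this list short after a single modification, whereas a hub-rich network would be expected to return many residues.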

Conclusion. The key results are: (i) little information is 
accessible from individual amino acids (i.e. in sequences) and it is 
the pairs of amino acids that need to be investigated, (ii) the 
geometry of the amino acid side chains, so far neglected, is a key 
parameter to understand pair matching and finally (iii) 
intermolecular β-strands need to be further explored in terms of 
networks. The intermolecular β-strand networks are rather 
disconnected networks with no hubs but nodes with few links 
instead. Such a layout has several advantages, as already 
discussed, but probably the most relevant one is the secluding 
characteristic of the network, which may well serve to limit the 
spread of changes, namely the rewiring, and protect the interface 
from dissociation upon mutation. 



Table 9. SC and BB hot spot pair chemical tendencies. 

Pair property    Total    SC tendency    BB tendency    Neutral 
(Fhi, X)         155      58             65             32 
(Ch, X)          90       50             26             14 
(P, X)           90       46             22             22 
(Fhi, Fhi)       55       24             23             8 
(Fhi, Ch)        50       19             23             8 
(Fhi, P)         50       15             19             16 
(Ch, Ch)         15       12             1              2 
(Ch, P)          25       19             2              4 
(P, P)           15       12             1              2 

The number of pairs with a ratio SC pair frequency to BB pair frequency above 1.0±0.2 indicates the SC pair tendency. The number of pairs with a ratio below 0.8±0.2 
indicates the BB pair tendency (table based on Fig. 2C). The second column, Total, indicates the pair combinatory of the chemical pair property mentioned in the first 
column. Fhi, Ch and P stand for hydrophobic, charged and polar residues. X stands for Fhi, Ch and P. 
doi:10.1371/journal.pone.0094745.t009 

Table 10. SC and BB hot spot geometrical pair tendencies. 

Geometrical pair property    Total    SC preferred    BB preferred    Neutral 
(L, X)                       74       43              10              21 
(M, X)                       144      72              41              31 
(S, X)                       119      41              49              29 
(L, L)                       10       8               1               1 
(L, M)                       36       23              5               8 
(L, S)                       28       12              4               12 
(M, M)                       45       29              6               10 
(M, S)                       63       20              30              13 
(S, S)                       28       9               15              4 

Legend as in Table 9. L, M and S stand for long, medium and short side chains. X stands for L, M and S. 
doi:10.1371/journal.pone.0094745.t010 



Methods 

Definitions 

Graph. A graph, or network, is a set of many components that 
interact with each other through pairwise interactions. At a highly 
abstract level, the components can be reduced to a series of nodes 
that are connected to each other by links, with each link 
representing the interaction between two components. The nodes 
and links together form a network, or, in more formal 
mathematical language, a graph [55]. The terms nodes and links 
used in graph theory correspond to amino acids/hot spots and 
contacts/interactions, respectively, in the present context. The 
number of links of a node is the degree k of the node. In the 
networks of hot spots in interaction, the residues are connected 
through different motifs. Two residues connected by only one link 
make a single pair, while two residues connected by more than one 
link make a multiple pair. Hot spots involved in a single pair are 
single contact hot spots. Hot spots with more than one individual 
contact are called multiple contact hot spots. 

Global propensity (GP). The global propensity of an amino 
acid is the ratio of its frequency in a defined environment to its 
frequency in a database. Here the global propensity measures the 
frequency of every amino acid in intermolecular β-strands divided 
by its frequency in the whole chain. 
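
As a worked illustration of this definition, the sketch below computes the global propensity of each amino acid from two collections of one-letter codes: the residues found in intermolecular β-strands and the residues of the whole chains. The function name and the toy sequences are hypothetical; the actual tallies come from the dataset described below.

```python
from collections import Counter

def global_propensity(strand_residues, chain_residues):
    """Frequency in intermolecular β-strands divided by frequency in whole chains.

    Both arguments are iterables of one-letter amino acid codes. A value
    above 1 means the amino acid is enriched in the β-strand interface.
    """
    strand = Counter(strand_residues)
    chain = Counter(chain_residues)
    n_strand = sum(strand.values())
    n_chain = sum(chain.values())
    return {
        aa: (strand[aa] / n_strand) / (chain[aa] / n_chain)
        for aa in chain
    }

# Hypothetical toy data: V is enriched in the strands, G is absent from them.
gp = global_propensity("VVIL", "VVILGGAA")
```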

Local preferences: the local amino acid preferences measure the 
preferred position of every amino acid on the β-strands. They are 
calculated as the difference between the frequency of a hot spot at 
the β-strand extremities (outer position) and its frequency when 
centrally located (any other position) on the strand. 

Chemistry of the side chain of amino acid: charged amino acids 
are D, E, H, R and K; polar amino acids are N, Q, S, T and Y; 
hydrophobic residues are A, C, F, G, I, L, M, P, V and W. 
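
The chemical classes above determine how many distinct amino acid pairs fall into each pair property, which is the "Total" column of Table 9. The short sketch below enumerates the 210 unordered pairs of the twenty amino acids and tallies them by class; the variable and function names are illustrative only.

```python
from collections import Counter
from itertools import combinations_with_replacement

# Chemical classes as defined in the text (Fhi = hydrophobic).
CHEMISTRY = {
    "Ch": "DEHRK",
    "P": "NQSTY",
    "Fhi": "ACFGILMPVW",
}
CLASS_OF = {aa: name for name, members in CHEMISTRY.items() for aa in members}

def pair_combinatory(class_of):
    """Count unordered amino acid pairs (identical pairs included) per class pair."""
    tally = Counter()
    for a, b in combinations_with_replacement(sorted(class_of), 2):
        tally[tuple(sorted((class_of[a], class_of[b])))] += 1
    return tally

totals = pair_combinatory(CLASS_OF)
```

The tallies for (Fhi, Fhi), (Fhi, Ch), (Fhi, P), (Ch, Ch), (Ch, P) and (P, P) are 55, 50, 50, 15, 25 and 15, which sum to the 210 pairs compared in Figure 4.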

Length of the side chain of amino acid: long side chain residues 
are K, W, R and Y; medium side chain residues are D, N, L, I, H, 
E, Q, M and F and short side chain amino acids are G, A, P, C, S, 
T and V. 


Construction of a non-redundant dataset 

The Protein Data Bank (PDB) was first screened at the Research 
Collaboratory for Structural Bioinformatics (RCSB) for protein 
oligomers of stoichiometry above 2 and lower than or equal to 12 
[56]. Above dodecamers the number of cases becomes too small for 
statistical analysis. Dimers are excluded from the dataset because 
of their diversity of orientation contacts, implying broad 
diversity in recognition contact modes [57]. Viral and membrane 
proteins have been removed because they are likely to follow a 
different mechanism of interface formation than soluble oligomers. 
The coordinates of the biological assembly were taken to select for 
non-crystallographic oligomers. NMR and X-ray structures are taken 
into account. PDB entries containing only backbone (BB) atoms, 
or only a few side-chain (SC) atoms, are discarded by monitoring 
the ratio of available SC and BB atoms for each of the twenty 
amino acids. Proteins with sequences similar at 90% identity are 
removed. As a result, 6234 PDBs have been tentatively treated 
with Gemini to describe the whole interface. There is a small 
minority of cases where Gemini stops before yielding the interface. 
Mainly, this is due to the presence of a single subunit in the PDB 
file, while Gemini expects several. This happens even if biological 
assemblies were downloaded from the RCSB. At this point, the 
interface is available for a set of 5248 proteins. Receptor-ligand, 
enzyme-inhibitor, and antigen-antibody types of interactions 
involve different ranges of K_D than permanent oligomers and as 
such are expected to have different recognition modes [42]. 
Therefore they are discarded from the dataset by removing 
proteins having at least one very short chain (≤20 amino acids). 
Truncated proteins were also discarded from the dataset by 
selecting only cases whose chains differ in length by less than 20 
amino acids. 
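
The stoichiometry and chain-length criteria described above can be summarized as a simple predicate over the chain lengths of one oligomer. This is a minimal sketch under stated assumptions: the function and parameter names are hypothetical, and the exact boundary of the short-chain cutoff (20 residues or fewer) is an assumption, since only "very short" chains of about 20 amino acids are mentioned.

```python
def keep_oligomer(chain_lengths, short_cutoff=20, max_length_gap=20):
    """Return True if an oligomer passes the size and chain-length filters.

    chain_lengths: list with the number of residues of each chain.
    short_cutoff: chains of this length or shorter are treated as ligands,
        inhibitors or peptides (assumed boundary).
    max_length_gap: chains must differ in length by less than this value,
        otherwise the entry is treated as containing a truncated protein.
    """
    n_chains = len(chain_lengths)
    if not 2 < n_chains <= 12:          # trimers up to dodecamers only
        return False
    if min(chain_lengths) <= short_cutoff:
        return False
    return max(chain_lengths) - min(chain_lengths) < max_length_gap
```

For example, a trimer with chains of 120, 118 and 121 residues is kept, whereas a dimer, a complex containing a 15-residue peptide, or an assembly whose chains differ by 30 residues is rejected.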

Using the secondary structure annotation provided in the PDB 
file, the cases with intermolecular β-strands were extracted 
according to the following set of rules (to be simultaneously 
satisfied): 1) at least 3 bonds must be between amino acids 
belonging to β-strands; 2) at least 2 interface amino acids of each 
subunit must be in a β-β bond; 3) at least 5 interface amino acids 
must be classified β. The first rule is actually redundant as it is 
implied by the second and the third. To simplify the treatment, in 
the case of hetero-oligomers with more than one intermolecular 
β-strand, only one, randomly chosen, has been considered. The final 
list has been screened against redundancies by mapping each PDB 
code to a UniProt identifier. This allows using the appropriate 
UniProt algorithms to find and remove redundant cases. After this 
final suppression, we are left with 755 proteins having 1048 
regions of intermolecular β-strands. 
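
The three selection rules can be sketched as a single test over the bonds of one interface between two subunits. The data model below (a `Residue` tuple carrying a subunit label, a residue number and a β-strand flag taken from the secondary structure annotation) is hypothetical and only serves the illustration; it is not the Gemini data structure.

```python
from typing import NamedTuple

class Residue(NamedTuple):
    subunit: str
    number: int
    in_beta_strand: bool

def is_intermolecular_beta_interface(bonds):
    """Apply the three rules to a list of (Residue, Residue) interface bonds."""
    beta_beta = [(a, b) for a, b in bonds if a.in_beta_strand and b.in_beta_strand]
    # Rule 1: at least 3 bonds between amino acids belonging to β-strands.
    if len(beta_beta) < 3:
        return False
    # Rule 2: at least 2 interface amino acids of each subunit in a β-β bond.
    per_subunit = {}
    for a, b in beta_beta:
        per_subunit.setdefault(a.subunit, set()).add(a)
        per_subunit.setdefault(b.subunit, set()).add(b)
    subunits = {r.subunit for pair in bonds for r in pair}
    if any(len(per_subunit.get(s, ())) < 2 for s in subunits):
        return False
    # Rule 3: at least 5 interface amino acids classified β.
    beta_residues = {r for pair in bonds for r in pair if r.in_beta_strand}
    return len(beta_residues) >= 5
```

All three rules are kept explicit for readability, even though the first is stated to follow from the other two.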

Hot spots in interaction 

A pair of hot spots is made of a hot spot -a- interacting with a 
hot spot -b-. Some hot spots participate in more than one pair at 
the same time and it is necessary to avoid their multiple counting. 

Figure 5. Number of contacts of the hot spots. A. The degree 
distributions of the BB and SC hot spots are plotted on a semi-log scale. 
The degree distribution P(k) of the SC hot spots decreases exponentially 
(R² = 0.99). B. Linear correlation between the number of atoms of a SC 
hot spot and its tendency to have more than one contact. The ratio of 
the frequency of an amino acid in multiple contacts to its frequency in 
single contact is plotted against the number of its side chain atoms. C. 
Probability of a SC hot spot to have k contacts. The probabilities for a SC 
hot spot to have k>3 (♦ ) or k=^ (O) are plotted against the number 
of atoms of its respective amino acid. The horizontal line indicates the 
probability at which every amino acid has the same probability to have 
k contacts (0.05 = 1/20). The vertical line indicates a number of atoms 
equal to 14. 

doi:10.1371/journal.pone.0094745.g005 







Figure 6. The p53 intermolecular β-strand network. A. Atomic 
structure of the p53 tetramerization domain (PDB 1SAK). The picture is 
generated with Rasmol; the four chains are shown as ribbons of 
different colors. The G334 residue is indicated in spacefill. B. Gemini 
graph of the WT p53 tetramerization domain. The intermolecular 
β-strands, composed of the residues 324 to 334, are highlighted by the 
yellow and purple arrows. The vertical arrows point to the residue 334. 
The links and hot spot contacts of G334 are shown by dotted red lines 
and red circles, respectively. C. Gemini graph of the G334V mutant. The 
hot spots whose links are affected by the mutation are underlined in red. 
The changes are not limited to residues in direct contact with G334 or 
to residues of the intermolecular β-strands. 
doi:10.1371/journal.pone.0094745.g006 

A pair (A1, A2) is counted 1/n times, with n the number of bonds of 
A1. Let us consider a hot spot G forming a pair with T and another 
pair with L. Each of the (G, T) and (G, L) pairs is counted as a half, 
so the occurrence of G is equal to one, and not to two as it would be 
if the pairs (G, T) and (G, L) had each been counted as one. This 
counting procedure implies that the tallies of occurrences must be 
read row-wise (Tables 5 and 6). Now, when the number of 
interactions (bonds) issued from a hot spot is counted instead of the 
pair occurrences, such normalization is unnecessary. 
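
The 1/n weighting can be illustrated with the (G, T) and (G, L) example above. In this minimal sketch each hot spot is a hypothetical (residue identifier, amino acid) tuple, and every bond contributes to two rows of the tally: the row of its first partner, weighted by that partner's number of bonds, and the row of its second partner, weighted likewise. The function name and data layout are assumptions for illustration.

```python
from collections import Counter, defaultdict

def weighted_pair_occurrences(bonds):
    """Tally amino acid pairs, each pair (a, b) weighted by 1/n with n the bonds of a.

    bonds: list of (hot_spot_a, hot_spot_b), each hot spot being a
    (residue_id, amino_acid) tuple. Returns {(aa_row, aa_column): occurrence},
    to be read row-wise.
    """
    n_bonds = Counter()
    for a, b in bonds:
        n_bonds[a] += 1
        n_bonds[b] += 1
    tally = defaultdict(float)
    for a, b in bonds:
        tally[(a[1], b[1])] += 1 / n_bonds[a]   # contribution to the row of a
        tally[(b[1], a[1])] += 1 / n_bonds[b]   # contribution to the row of b
    return dict(tally)

# Hot spot G (residue 1) pairs with T (residue 2) and with L (residue 3).
g, t, l = (1, "G"), (2, "T"), (3, "L")
tally = weighted_pair_occurrences([(g, t), (g, l)])
row_g = sum(v for (row, _), v in tally.items() if row == "G")
```

Here (G, T) and (G, L) each count for one half, so the row of G sums to one occurrence, while the rows of T and L, which have a single bond each, count (T, G) and (L, G) as one.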

Statistical tools 

χ² of the n_ab and n_ba pair occurrences. The total observed pair 
occurrences n_ab and n_ba are calculated for each residue as the sum 
of the occurrences on a row and the sum of the occurrences on a 
column (Tables 5 and 6 for the BB and SC sub-networks, 
respectively). The significance of the differences between the 
occurrences n_ab and n_ba was assessed using a χ² (equation 2) with 
one degree of freedom calculated as follows: 

\[ \chi^2 = \sum_{i=A}^{Y} \sum_{j=A}^{Y} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \qquad (2) \]

With O_ij the observed occurrences (row i and column j in 
Tables 5 and 6) and E_ij the expected occurrences, calculated as the 
average value of the total observed pair occurrences n_ab and n_ba. 
The sums are over the n_ab and the n_ba occurrence values. For one 
degree of freedom, a χ² value below 3.84 is not significant (5% 
significance threshold). 
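
One possible reading of this test, for a single pair of tallies, is sketched below: the expected value is taken as the average of n_ab and n_ba, and the χ² is the sum of the two squared deviations divided by that expected value. The function name is hypothetical and the reduction of equation 2 to two terms is an assumption made for illustration, not a statement of how the published values were computed.

```python
CRITICAL_5_PERCENT_1_DOF = 3.84  # χ² threshold for one degree of freedom

def chi_square_asymmetry(n_ab, n_ba):
    """χ² comparing two occurrence tallies against their average."""
    expected = (n_ab + n_ba) / 2
    if expected == 0:
        return 0.0  # no occurrences at all: nothing to compare
    return ((n_ab - expected) ** 2 + (n_ba - expected) ** 2) / expected

def is_significant(n_ab, n_ba):
    """True when the difference exceeds the 5% threshold."""
    return chi_square_asymmetry(n_ab, n_ba) >= CRITICAL_5_PERCENT_1_DOF
```

With this reading, tallies of 33 and 27 give a χ² of 0.6 (not significant), whereas tallies of 40 and 20 give about 6.67 (significant at the 5% threshold).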

Observed (f_ab) and expected (f_a × f_b) values. The significance of 
the differences between the observed (f_ab) and expected pair 
frequencies (f_a × f_b) was also assessed using a χ² with O_ij and 
E_ij the observed and expected pair frequencies, respectively. This 
time it is calculated over a matrix where low occurrences (below 5) 
are summed and a p-value is calculated. 
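
The ratio of observed to expected pair frequencies, f_ab/(f_a × f_b), reported in Tables 7 and 8 can be sketched as follows. How the individual frequencies f_a and f_b are normalized is not restated here, so the sketch assumes they are the fractions of each amino acid among the hot spots; the function name and toy counts are hypothetical.

```python
def pair_enrichment(pair_counts, residue_counts):
    """Return {(a, b): f_ab / (f_a * f_b)} from raw counts.

    pair_counts: {(aa_a, aa_b): number of observed pairs}
    residue_counts: {aa: number of hot spots of that amino acid}
    (assumed normalization: each frequency is a count divided by its total)
    """
    total_pairs = sum(pair_counts.values())
    total_residues = sum(residue_counts.values())
    ratios = {}
    for (a, b), n in pair_counts.items():
        f_ab = n / total_pairs
        f_a = residue_counts[a] / total_residues
        f_b = residue_counts[b] / total_residues
        ratios[(a, b)] = f_ab / (f_a * f_b)
    return ratios

# Hypothetical toy counts.
ratios = pair_enrichment({("V", "I"): 2, ("V", "V"): 2}, {"V": 6, "I": 2})
```

A ratio above 1 marks a pair that occurs more often than expected from the individual frequencies, a ratio below 1 a pair that is disfavored.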
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Virtual mutation 

FoldX was used to generate the virtual mutation G334V in the 
PDB structure of the p53 tetramerization domain, following the 
instructions in [58,59]. 
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